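Centralized training (CT) is the basis for many popular multi-agent reinforcement learning (MARL) methods because it allows agents to quickly learn high-performing policies. However, CT relies on agents learning from one-off observations of other agents' actions in a given state. Because MARL agents explore and update their policies during training, these observations often provide poor predictions of other agents' behavior and of the expected return for a given action. CT methods therefore suffer from high variance and error-prone estimates, which harms learning. CT methods also suffer from explosive growth in complexity unless strong factorization restrictions are imposed (e.g., the monotonic reward functions of QMIX). We address these challenges with a new semi-centralized MARL framework that performs policy-embedded training and decentralized execution. Our method, the Policy Embedded Reinforcement Learning Algorithm (PERLA), is an enhancement tool for actor-critic MARL algorithms that leverages a novel parameter-sharing protocol and a policy-embedding method to maintain estimates that account for other agents' behavior. Our theory proves that PERLA dramatically reduces the variance of value estimates. Unlike various CT methods, PERLA, which seamlessly adopts MARL algorithms, scales easily with the number of agents without requiring restrictive factorization assumptions. We demonstrate PERLA's superior empirical performance and efficient scaling in benchmark environments including StarCraft Micromanagement II and Multi-Agent Mujoco.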
translated by Google Translate
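Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which excessively restricts its application to real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertainty. Although this regularization-based reformulation is appealing for its simplicity and efficiency, it is still lacking for continuous control tasks. In this paper, we propose a new regularizer named $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR). In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach that generates them based on the value function. We evaluate USR on the Real-World Reinforcement Learning (RWRL) benchmark, demonstrating improved robust performance in perturbed testing environments.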
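Load forecasting is essential to power system analysis and grid planning. We therefore propose a household load forecasting method based on federated deep learning and non-intrusive load monitoring (NILM). To the best of our knowledge, this is the first study of federated learning (FL) in NILM-based household load forecasting. In this method, the aggregate power is decomposed into individual appliance power by non-intrusive load monitoring, and the power of each appliance is forecast separately using a federated deep learning model. Finally, the predicted power values of the individual appliances are aggregated to form the total power forecast. Specifically, forecasting each electrical appliance separately avoids the error caused by the strong dependence in the power signal of a single appliance. In the federated deep learning forecasting model, the householders who hold the power data share the parameters of their local models rather than the local power data, which protects the privacy of household user data. The case results show that the proposed approach provides better forecasts than the traditional approach of directly forecasting the aggregated signal as a whole. In addition, experiments in various federated learning settings are designed and implemented to validate the effectiveness of the method.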
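The two aggregation steps described in this abstract (combining local model parameters without sharing raw data, then summing per-appliance forecasts into a household total) can be illustrated with a minimal sketch. It assumes a FedAvg-style weighted average, which the abstract does not specify; the function names `federated_average` and `aggregate_forecasts`, the parameter layout, and the sample numbers are all hypothetical.

```python
from typing import Dict, List


def federated_average(client_params: List[Dict[str, List[float]]],
                      client_sizes: List[int]) -> Dict[str, List[float]]:
    """Average client parameters, weighting each client by its sample count.

    Assumption: FedAvg-style weighting. Only parameters are exchanged;
    the raw household power data never leaves the client.
    """
    total = sum(client_sizes)
    averaged: Dict[str, List[float]] = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def aggregate_forecasts(appliance_forecasts: Dict[str, List[float]]) -> List[float]:
    """Sum per-appliance forecasts at each time step to get the total load."""
    return [sum(step) for step in zip(*appliance_forecasts.values())]


# Hypothetical example: two households, one parameter vector "w".
global_params = federated_average(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    client_sizes=[1, 3],
)

# Hypothetical per-appliance forecasts in watts over two time steps.
total_load = aggregate_forecasts(
    {"fridge": [100.0, 110.0], "kettle": [0.0, 2000.0]}
)
```

In this sketch the second household contributes three times as many samples, so its parameters dominate the average; a real system would apply the same idea to full neural network weight tensors.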
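Extracting information about fluid motion directly from images is challenging. Fluid flow is a complex dynamical system governed by the Navier-Stokes equations. General optical flow methods are typically designed for rigid-body motion and therefore struggle when applied directly to fluid motion estimation. Moreover, optical flow methods focus on only two consecutive frames without exploiting historical temporal information, whereas fluid motion (the velocity field) can be viewed as a continuous trajectory constrained by time-dependent partial differential equations (PDEs). This discrepancy can lead to physically inconsistent estimates. Here we propose a learning-based prediction-correction scheme for fluid flow estimation. An estimate is first given by a PDE-constrained optical flow predictor and is then refined by a physics-based corrector. On a benchmark dataset, the proposed approach shows competitive results compared with existing supervised learning-based methods. Furthermore, the proposed approach generalizes to complex real-world scenarios in which ground-truth information is effectively unknowable. Finally, experiments show that the physics-based corrector can refine flow estimates by mimicking the operator-splitting method commonly used in fluid dynamics simulation.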
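This paper presents a problem in power networks that creates an exciting yet challenging real-world scenario for applying multi-agent reinforcement learning (MARL). The emerging trend of decarbonization is placing excessive stress on power distribution networks. Active voltage control is seen as a promising solution for relieving power congestion and improving voltage quality without extra hardware investment, taking advantage of controllable devices in the network such as rooftop photovoltaics (PVs) and static var compensators (SVCs). These controllable devices appear in vast numbers and are distributed over a wide geographic area, making MARL a natural candidate. This paper formulates the active voltage control problem in the Dec-POMDP framework and establishes an open-source environment. It aims to bridge the gap between the power community and the MARL community and to be a driving force toward real-world applications of MARL algorithms. Finally, we analyze the special characteristics of the active voltage control problem that pose challenges for state-of-the-art MARL approaches, and we summarize potential directions.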
Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games, but its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to Markov convex games, calling the result the Markov Shapley value (MSV), and apply it as a value factorisation method in global reward games, which is justified by the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve SBOE. With a stochastic approximation and some transformations, we establish a new MARL algorithm called Shapley Q-learning (SHAQ), whose implementation is guided by the theoretical results on SBO and MSV. We also discuss the relationship between SHAQ and related value factorisation methods. In the experiments, SHAQ exhibits not only superior performance on all tasks but also interpretability that agrees with the theoretical analysis. The implementation of this paper is available at https://github.com/hsvgbkhgbv/shapley-q-learning.
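For readers unfamiliar with the quantity this abstract builds on, the following minimal sketch computes the classical Shapley value of a small static cooperative game by averaging each player's marginal contribution over all join orders. It is not the Markov Shapley value or the SHAQ algorithm; the function `shapley_values` and the toy convex game `coalition_value` are hypothetical illustrations.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact classical Shapley values by enumerating every join order.

    Cost grows factorially with the number of players, so this is only
    practical for very small games.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def coalition_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical convex game: a coalition of size k is worth k squared."""
    return float(len(coalition) ** 2)


credits = shapley_values(["a", "b", "c"], coalition_value)
```

Because the toy game is symmetric, every player receives the same credit, and the credits sum to the value of the grand coalition (the efficiency property), which is the sense in which a global reward can be factorised across agents.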
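Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce the Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated RS framework in which the shaping-reward function is constructed in a novel Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine the states in which to add shaping rewards and their optimal values, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which readily adopts existing RL algorithms, learns to construct a shaping-reward function tailored to the task, thereby ensuring efficient convergence to high-performance policies. In three carefully designed experiments, we demonstrate ROSA's superior performance over state-of-the-art RS algorithms in challenging sparse-reward environments.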
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations that have few known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing work in terms of the type of KG and the methods used. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.